Diagnostic evaluation of a personalized filtering information retrieval system. Methodology and experimental results

Author

  • Christine Michel
Abstract

The study presented in this paper deals with the diagnostic evaluation of a system currently being implemented. The particularity of the tested system is that its filtering process takes the user's personal characteristics into account. The aim of the diagnostic evaluation is to choose one filtering process among the 8 proposed ones. A representative sample of 16,300 interrogations is used; it combines characteristics relating to the user's profile, the user's information need and the filtering process. Answers are compared with respect to the number of common documents, the rank of the common documents and the degree of specificity of the query. These criteria give an indication of the filtering impact.

Introduction

Hirschman et al (Hirschman 95) distinguish three evaluation types. The adequacy evaluation determines the fitness of a system for a purpose. The diagnostic evaluation is the production of a system performance profile with respect to some "taxonomisation" of the space of possible inputs.[1] Software engineering teams also use it to compare two generations of the same system (regression testing). The performance evaluation is the measure of the system's performance in one or more specific areas. It is typically used to compare like with like between two alternative implementations of a technology. In "information retrieval itself, a classic criterion is precision .../..., a measure is the percentage of documents retrieved which are in fact relevant .../..., and a method for computing it is to simply average over some number of test queries the ratio achieved by the system under test." The well-known TREC experiments are performance evaluations. "One of the goals of TREC is to provide task evaluation that allows cross-system comparison, which has proven to be the key strength in TREC. .../... The addition of secondary tasks (called tracks) in TREC-4 combined these strengths by creating a common evaluation for retrieval sub-problems" (Voorhees 98).

The methodology presented here is halfway between a diagnostic and a performance evaluation. It allows quick auto-evaluation of information retrieval systems during the design stage. The aim of the test is to quantify the stability or the reactivity of the system when it is submitted to different personalized filtering criteria. The tested system is often just a prototype, so real users with personal information needs cannot query it directly; the interrogations have to be simulated in the laboratory. We consider the system as a black box submitted to different contexts of information. The protocols recommended in such cases are purely quantitative, in order to have exact control over the variables: each particular component is isolated and observed to see how it modifies the system's answers.

In line with diagnostic evaluation, we have made a "taxonomy" of the possible inputs, i.e. the different types of users in search (each user is represented by a specific profile and a specific information need). Nevertheless, it is also a performance evaluation, because for the same system we compare 8 different filtering algorithms using a performance criterion: the filtering impact, i.e. the degree of similarity between the neutral answer obtained without any filtering and the filtered answer. The system tested in our experiment presents its answers as ranked clusters.

[1] It is typically used by system developers, but sometimes offered to end-users as well. It usually requires the construction of a large and hopefully representative test suite.
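To make the shape of such a test suite concrete, the following sketch (in Python, not taken from the paper) enumerates interrogation contexts as combinations of user-profile characteristics, queries and candidate filtering processes. All concrete values and variable names are illustrative assumptions; they are not the ones actually used to build the 16,300 interrogations of the experiment.

```python
from itertools import product

# Illustrative taxonomy of inputs: each interrogation context combines a user
# profile, an information need (textual query) and one of the candidate
# filtering processes. The concrete values are placeholders, not the paper's.
educational_levels = ["student", "doctor", "confirmed researcher"]
disciplinary_fields = ["information science", "mathematics"]
search_stages = ["state of the art", "experimentation"]
search_types = ["specific", "general"]
queries = ["large-scale system evaluation", "document clustering"]
filtering_algorithms = [f"filter_{i}" for i in range(1, 9)]  # the 8 candidates

interrogations = [
    {
        "profile": {"level": lvl, "field": fld, "stage": stg, "type": typ},
        "query": q,
        "filtering": f,
    }
    for lvl, fld, stg, typ, q, f in product(
        educational_levels, disciplinary_fields, search_stages,
        search_types, queries, filtering_algorithms)
]
print(len(interrogations))  # 384 in this toy sample (16,300 in the experiment)
```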
In the first part of this paper we present the alternatives proposed to experimenters in this situation. In the second part we present the tested system, Profil-Doc; in the third part, the experimental protocol; and in the last part, the results.

1 The ranked clustering answer as a problem in evaluation

Clustering is used to improve the visibility of an information set. "Document clustering has long been investigated as a post-retrieval document visualization technique. Document clustering algorithms attempt to group documents together based on their similarities .../... This can help users both in locating interesting documents more easily and in getting an overview of the retrieved document set." "The Information Retrieval community has long explored a number of post-retrieval document visualization techniques as alternatives to the ranked list presentation .../...: document networks, spring embeddings, document clustering, and self-organizing maps. Of the four major techniques, only document clustering appears to be both fast enough and intuitive enough to require little training or adjustment time from the user." (Zamir 99). In our case, Profil-Doc via SPIRIT[2] uses a clustering process and also ranks the different clusters by order of relevance. As Kantor said, "Clusters of documents, as clusters of terms, represent concepts. While each document no doubt contains many concepts, the cluster will rank some concepts more highly" (Kantor 94). SPIRIT's ranking method is presented in (Fluhr 84). For example, the answer given by SPIRIT for the query "large-scale system evaluation" is composed of 104 documents grouped into 12 clusters, ranked by order of relevance (cf. Table 1).

Cluster rank | Cluster name                      | Document references               | Number of documents
1            | system-evaluation-large-scale     | docu462                           | 1
2            | system-evaluation, scale          | docu104                           | 1
3            | system, evaluation, large, scale  | docu264, docu457                  | 2
4            | evaluation, large, scale          | docu262, docu263, docu265         | 3
5            | evaluation, large                 | docu199                           | 1
6            | system, large, scale              | docu259, docu456, docu459 ...     | 5
7            | system, large                     | docu458                           | 1
8            | system, evaluation, scale         | docu36, docu29, docu317 ...       | 12
9            | system, evaluation                | docu49, docu288, docu318 ...      | 4
10           | large, scale                      | docu261, docu463                  | 2
11           | evaluation, scale                 | docu213, docu245 ...              | 7
12           | system, scale                     | docu230, docu211, docu196 ...     | 65

Table 1: Example of a ranked clustering answer

[2] SPIRIT (Syntactic and Probabilistic Indexing and Retrieval Information System) is a commercial product of T.GID. Research on SPIRIT is carried out with the CEA-DIST (Commissariat à l'Energie Atomique, Scientific and Technical Information Direction) – http://www.dist.cea.fr/

According to Tague (Tague 95, Fricke 98), the documents' rank of presentation is one of the five aspects to take into account when evaluating the quality of an information retrieval system or of an information research centre. Indeed, "algorithmically ranked retrieval results become interpreted and assessed by users during session time. The judgment is in accordance with the users' dynamic and situational perceptions of a real or simulated information retrieval task" (Borlung 98). Tague quotes many studies that highlight the delay induced in the satisfaction of an informational need by a possible modification of this order of presentation (Tague 96). The measures of Recall, Precision, Jaccard, Cosine, ... used in collection test evaluations or comparative test protocols do not take the documents' order of presentation into account.
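As a minimal illustration (not drawn from the paper), the sketch below shows why a purely set-based measure such as Jaccard is blind to the order of presentation: two answers containing the same documents in reversed rank order receive a similarity of 1. The document identifiers are reused from Table 1, and the two answer lists are invented purely for illustration.

```python
def jaccard(a, b):
    """Set-based similarity: the rank at which documents appear is ignored."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Two hypothetical answers containing the same documents in opposite order.
neutral_answer  = ["docu462", "docu104", "docu264", "docu457", "docu262"]
filtered_answer = ["docu262", "docu457", "docu264", "docu104", "docu462"]

print(jaccard(neutral_answer, filtered_answer))  # 1.0: the rank change is invisible
```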
Such measures are the results of intersections or unions of the compared sets. In the trec_eval package of the TREC-7 report (Voorhees 98), we can find several adaptations of Recall and Precision used when systems return a ranked list of documents. There are 85 numbers per run in the trec_eval package. For example, P(10) is the precision after the first 10 documents are retrieved; P(30) is the precision after the first 30 documents are retrieved; R-Prec is the precision after the first R documents are retrieved, where R is the number of relevant documents for the current topic; mean average precision is the mean of the average precision; R(1000) is the recall after 1000 documents are retrieved; rank first rel is the rank of the first relevant document retrieved. We can see in this example that several measures are roughly the same; they vary only in the cut-off level. A statistical study of 8 measures (Voorhees 98) shows that several are correlated, i.e. measure the same thing. This example highlights the importance of a good, global evaluation measure. We propose (Michel 99) new mathematical methods and formalisms allowing us to build measures of proximity that take the documents' rank of presentation into account. We call them OS measures, i.e. measures of Ordered Similarity. The experiment presented in section 3 gives an example of an OS measure.

2 The tested system

Profil-Doc is a full-text information retrieval system made specifically for researchers in scientific and technical information fields. "Its aim is to carry out a pre-orientation toward an information corpus restricted to user-relevant information determined with the aid of utility criteria" (Laine 96). The pre-orientation system includes three fundamental operations:

- a characterization of the user's profile according to four criteria: educational level (student, doctor, confirmed researcher), disciplinary field (information science, mathematics, ...), search stage (state of the art, definition of a subject, experimentation, discussion, ...) and type of search (specific or general);
- a segmentation of the texts to be processed into parts of text according to three criteria: the type of part (resume, introduction, experimentation, ...), the discursive form of the part (argumentative, descriptive, ...) and the format of presentation (text, equation, image, ...). Characterized parts of text are called "documentary units". The description format of the database is part of the system; it is so specific that it prevents us from using a classical large-scale test collection as in the TREC example;
- a filtering process that selects the useful parts for the identified user. 8 different filtering algorithms are under evaluation.

From a specific user's profile, the filtering process determines the usefulness properties of the parts of text; these make the extraction of a personalized corpus possible. "Once the 'personalized' corpus has been defined, documentary software can be used to implement a classical search procedure to process user queries" (Laine 96). The chosen documentary system is SPIRIT, a full-text, natural-language querying system. As already said, SPIRIT groups the answer texts into clusters according to the query concepts they treat and ranks the clusters: the higher the rank, the more pertinent the cluster is to the query. We work with the SPIRIT-W3 version, i.e. "SPIRIT databases could be carried through a standard browser" (Fluhr 97), so the system is presented as a Web server.
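To make the pre-orientation idea above more tangible, here is a hypothetical sketch of a filtering step that derives usefulness properties from a user profile and extracts a personalized corpus of documentary units. The rule and all names (usefulness_properties, personalized_corpus, the unit fields) are invented for illustration; this is not one of the 8 filtering algorithms under evaluation.

```python
# Hypothetical sketch of the pre-orientation step: a filtering rule maps a user
# profile onto the properties of documentary units judged useful, and the
# personalized corpus is the subset of units matching those properties.

def usefulness_properties(profile):
    """Toy rule (assumption): choose useful part types from the search stage."""
    if profile["stage"] == "state of the art":
        return {"resume", "introduction"}
    return {"experimentation", "discussion"}

def personalized_corpus(documentary_units, profile):
    """Keep only the documentary units whose part type is useful for the profile."""
    wanted = usefulness_properties(profile)
    return [unit for unit in documentary_units if unit["part_type"] in wanted]

units = [
    {"id": "docu462-p1", "part_type": "introduction", "form": "descriptive"},
    {"id": "docu462-p2", "part_type": "experimentation", "form": "argumentative"},
]
profile = {"level": "student", "field": "information science",
           "stage": "state of the art", "type": "general"}
print([u["id"] for u in personalized_corpus(units, profile)])  # ['docu462-p1']
```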
The tested prototype is composed of a database of 505 documentary units and a non-ergonomic interrogation interface, i.e. the user cannot directly give his profile to the filtering process: he has to specify manually the types of parts of text he wants. Profil-Doc interrogations are therefore composed of factual criteria, defining the properties of the parts of text useful for the user's profile, and of a textual query, defining the user's information need.

3 Evaluation Methodology

As shown in Figure 1, the protocol is composed of 3 steps. First, as Hirschman (Hirschman 95) said, there is a "taxonomy" of the system inputs, i.e. the interrogations: we have to compare 8 different filtering algorithms, and we characterize a context of interrogation by a specific user's profile and a query. The system querying is made automatically, and the answers are compared in order to evaluate the filtering impact, i.e. the degree of similarity between the answers obtained with a filtering process and without one (the neutral answers).
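The loop below sketches, under stated assumptions, how such an automatic comparison could be organized: query_system is a hypothetical interface to the prototype's Web server, and answer_similarity stands in for the OS (Ordered Similarity) measure used in the actual experiment; neither name comes from the paper.

```python
from collections import defaultdict

def filtering_impact(interrogation, query_system, answer_similarity):
    """Similarity between the neutral answer and the filtered answer.

    A value of 1 means the filter left the neutral answer unchanged (a stable
    system); lower values indicate a stronger filtering impact (a reactive one).
    """
    neutral = query_system(interrogation["query"], filtering=None)
    filtered = query_system(interrogation["query"],
                            filtering=interrogation["filtering"],
                            profile=interrogation["profile"])
    return answer_similarity(neutral, filtered)

def evaluate(interrogations, query_system, answer_similarity):
    """Average filtering impact obtained by each candidate filtering algorithm."""
    impacts = defaultdict(list)
    for inter in interrogations:
        impacts[inter["filtering"]].append(
            filtering_impact(inter, query_system, answer_similarity))
    return {name: sum(values) / len(values) for name, values in impacts.items()}
```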


